Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 443 | 426 |
| Missing cells (%) | 8.3% | 8.0% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
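The derived rows in the table above can be cross-checked from the raw counts. The sketch below is a sanity check, not the profiler's own code. It assumes that "Missing cells (%)" is the missing-cell count divided by variables × observations, and that the average record size is the total memory divided by the number of observations. The function names are illustrative.

```python
# Values copied from the "Dataset statistics" table.
N_VARIABLES = 12
N_OBSERVATIONS = 446


def missing_cells_pct(missing_cells: int) -> float:
    """Share of missing cells among all cells, in percent (assumed definition)."""
    total_cells = N_VARIABLES * N_OBSERVATIONS
    return 100.0 * missing_cells / total_cells


def avg_record_size_bytes(total_kib: float) -> float:
    """Average bytes per row, assuming total size / number of observations."""
    return total_kib * 1024 / N_OBSERVATIONS


pct_a = missing_cells_pct(443)  # Dataset A
pct_b = missing_cells_pct(426)  # Dataset B
record_size = avg_record_size_bytes(45.3)

print(f"Missing cells: A={pct_a:.1f}%  B={pct_b:.1f}%")
print(f"Average record size: {record_size:.1f} B")
```

Under these assumptions the computed values round to the 8.3%, 8.0%, and 104.0 B shown in the table.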
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 4 | 4 |
| Text | 3 | 3 |
Alerts
| Dataset A | Dataset B | Alert type |
|---|---|---|
| Age has 89 (20.0%) missing values | Age has 83 (18.6%) missing values | Missing |
| Cabin has 353 (79.1%) missing values | Cabin has 342 (76.7%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 299 (67.0%) zeros | SibSp has 296 (66.4%) zeros | Zeros |
| Parch has 340 (76.2%) zeros | Parch has 329 (73.8%) zeros | Zeros |
| Fare has 7 (1.6%) zeros | Fare has 6 (1.3%) zeros | Zeros |
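Each count-based alert above states a count and a percentage of the 446 observations. As a minimal sketch, the messages can be parsed and checked for internal consistency. The regular expression and the `check_alert` helper are illustrative assumptions, not part of the profiling tool.

```python
import re

N_OBSERVATIONS = 446  # from the "Dataset statistics" table

# Matches messages such as "Age has 89 (20.0%) missing values".
ALERT_RE = re.compile(r"(\w+) has (\d+) \(([\d.]+)%\) (missing values|zeros)")


def check_alert(text: str, n_rows: int = N_OBSERVATIONS) -> bool:
    """Return True if the stated percentage equals count / n_rows to one decimal."""
    match = ALERT_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized alert: {text!r}")
    count, stated = int(match.group(2)), match.group(3)
    return f"{100 * count / n_rows:.1f}" == stated


alerts = [
    "Age has 89 (20.0%) missing values",
    "Age has 83 (18.6%) missing values",
    "Cabin has 353 (79.1%) missing values",
    "Cabin has 342 (76.7%) missing values",
    "SibSp has 299 (67.0%) zeros",
    "SibSp has 296 (66.4%) zeros",
    "Parch has 340 (76.2%) zeros",
    "Parch has 329 (73.8%) zeros",
    "Fare has 7 (1.6%) zeros",
    "Fare has 6 (1.3%) zeros",
]

all_consistent = all(check_alert(alert) for alert in alerts)
print(all_consistent)
```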
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2023-12-07 16:08:43.182664 | 2023-12-07 16:08:46.856741 |
| Analysis finished | 2023-12-07 16:08:46.855711 | 2023-12-07 16:08:50.740655 |
| Duration | 3.67 seconds | 3.88 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
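The "Duration" row follows from the start and finish timestamps. A short sketch, assuming the timestamps use the `%Y-%m-%d %H:%M:%S.%f` layout shown in the table (the helper name is illustrative):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed wall-clock time between two timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


dur_a = duration_seconds("2023-12-07 16:08:43.182664", "2023-12-07 16:08:46.855711")
dur_b = duration_seconds("2023-12-07 16:08:46.856741", "2023-12-07 16:08:50.740655")

print(f"A: {dur_a:.2f} seconds, B: {dur_b:.2f} seconds")
```

The timestamps also show that the two analyses ran back to back: Dataset B started about one millisecond after Dataset A finished.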
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 443.47534 | 437.16592 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 3 | 1 |
| Maximum | 889 | 890 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 3 | 1 |
| 5-th percentile | 37.25 | 49.25 |
| Q1 | 235.25 | 221.5 |
| median | 439 | 430.5 |
| Q3 | 658.5 | 654.75 |
| 95-th percentile | 841.5 | 841.75 |
| Maximum | 889 | 890 |
| Range | 886 | 889 |
| Interquartile range (IQR) | 423.25 | 433.25 |
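The last two rows of the quantile table are derived from the others: the range is maximum minus minimum, and the interquartile range is Q3 minus Q1. A small check using the PassengerId values above (the dictionary layout is only for illustration):

```python
# Quantile values for PassengerId, copied from the table.
quantiles = {
    "A": {"min": 3, "q1": 235.25, "q3": 658.5, "max": 889},
    "B": {"min": 1, "q1": 221.5, "q3": 654.75, "max": 890},
}


def value_range(q: dict) -> float:
    """Range = maximum - minimum."""
    return q["max"] - q["min"]


def iqr(q: dict) -> float:
    """Interquartile range = Q3 - Q1."""
    return q["q3"] - q["q1"]


for name, q in quantiles.items():
    print(name, value_range(q), iqr(q))
```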
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 253.59671 | 251.97954 |
| Coefficient of variation (CV) | 0.5718395 | 0.57639338 |
| Kurtosis | -1.1294502 | -1.1453682 |
| Mean | 443.47534 | 437.16592 |
| Median Absolute Deviation (MAD) | 211 | 216 |
| Skewness | -0.02666757 | 0.034909975 |
| Sum | 197790 | 194976 |
| Variance | 64311.293 | 63493.689 |
| Monotonicity | Not monotonic | Not monotonic |
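Several descriptive statistics are related to each other, which gives a quick way to sanity-check the table. The sketch below assumes the usual definitions (mean = sum / count, coefficient of variation = standard deviation / mean, variance = standard deviation squared) and uses the rounded PassengerId figures from the report, so small differences in the last digits are expected.

```python
N_OBSERVATIONS = 446  # PassengerId has no missing values

# Rounded figures copied from the report.
stats = {
    "A": {"sum": 197790, "std": 253.59671},
    "B": {"sum": 194976, "std": 251.97954},
}

derived = {}
for name, s in stats.items():
    mean = s["sum"] / N_OBSERVATIONS
    derived[name] = {
        "mean": mean,
        "cv": s["std"] / mean,       # coefficient of variation
        "variance": s["std"] ** 2,   # variance from the rounded std
    }

for name, d in derived.items():
    print(f"{name}: mean={d['mean']:.5f} cv={d['cv']:.7f} variance={d['variance']:.3f}")
```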
| Value | Count | Frequency (%) |
| 604 | 1 | 0.2% |
| 666 | 1 | 0.2% |
| 220 | 1 | 0.2% |
| 573 | 1 | 0.2% |
| 308 | 1 | 0.2% |
| 707 | 1 | 0.2% |
| 166 | 1 | 0.2% |
| 800 | 1 | 0.2% |
| 727 | 1 | 0.2% |
| 463 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 415 | 1 | 0.2% |
| 449 | 1 | 0.2% |
| 92 | 1 | 0.2% |
| 165 | 1 | 0.2% |
| 107 | 1 | 0.2% |
| 670 | 1 | 0.2% |
| 592 | 1 | 0.2% |
| 772 | 1 | 0.2% |
| 409 | 1 | 0.2% |
| 853 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 8 | 1 | |
| 9 | 1 | |
| 12 | 1 | |
| 14 | 1 | |
| 15 | 1 | |
| 16 | 1 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 2 | 1 | |
| 4 | 1 | |
| 6 | 1 | |
| 9 | 1 | |
| 10 | 1 | |
| 11 | 1 | |
| 12 | 1 | |
| 19 | 1 | |
| 23 | 1 |
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 2 | 2 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 0 | 1 |
| 2nd row | 0 | 1 |
| 3rd row | 0 | 1 |
| 4th row | 0 | 1 |
| 5th row | 1 | 1 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 280 | |
| 1 | 166 |
| Value | Count | Frequency (%) |
| 0 | 273 | |
| 1 | 173 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 280 | |
| 1 | 166 |
| Value | Count | Frequency (%) |
| 0 | 273 | |
| 1 | 173 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 280 | |
| 1 | 166 |
| Value | Count | Frequency (%) |
| 0 | 273 | |
| 1 | 173 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 446 |
| Value | Count | Frequency (%) |
| Common | 446 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 280 | |
| 1 | 166 |
| Value | Count | Frequency (%) |
| 0 | 273 | |
| 1 | 173 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 |
| Value | Count | Frequency (%) |
| ASCII | 446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 280 | |
| 1 | 166 |
| Value | Count | Frequency (%) |
| 0 | 273 | |
| 1 | 173 |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 3 | 3 |
| 2nd row | 3 | 2 |
| 3rd row | 3 | 1 |
| 4th row | 1 | 1 |
| 5th row | 3 | 2 |
Common Values
| Value | Count | Frequency (%) |
| 3 | 240 | |
| 1 | 103 | |
| 2 | 103 |
| Value | Count | Frequency (%) |
| 3 | 240 | |
| 1 | 112 | |
| 2 | 94 | 21.1% |
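Most "Frequency (%)" cells in the tables above are blank, presumably because the original report drew them as bars. They can be recovered from the counts. The sketch below assumes frequency = count / total count, which reproduces the one surviving value (21.1% for class 2 in Dataset B). The helper name is illustrative.

```python
def frequency_table(counts: dict) -> dict:
    """Map each value to its share of the total count, as a one-decimal percentage."""
    total = sum(counts.values())
    return {value: f"{100 * count / total:.1f}%" for value, count in counts.items()}


# Pclass counts copied from the "Common Values" tables.
freq_a = frequency_table({"3": 240, "1": 103, "2": 103})
freq_b = frequency_table({"3": 240, "1": 112, "2": 94})

print("Dataset A:", freq_a)
print("Dataset B:", freq_b)
```

The same helper applies to any other count table in this report whose frequency cells are blank.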
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 240 | |
| 1 | 103 | |
| 2 | 103 |
| Value | Count | Frequency (%) |
| 3 | 240 | |
| 1 | 112 | |
| 2 | 94 | 21.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 240 | |
| 1 | 103 | |
| 2 | 103 |
| Value | Count | Frequency (%) |
| 3 | 240 | |
| 1 | 112 | |
| 2 | 94 | 21.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 446 |
| Value | Count | Frequency (%) |
| Common | 446 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 240 | |
| 1 | 103 | |
| 2 | 103 |
| Value | Count | Frequency (%) |
| 3 | 240 | |
| 1 | 112 | |
| 2 | 94 | 21.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 |
| Value | Count | Frequency (%) |
| ASCII | 446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 240 | |
| 1 | 103 | |
| 2 | 103 |
| Value | Count | Frequency (%) |
| 3 | 240 | |
| 1 | 112 | |
| 2 | 94 | 21.1% |
Name
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 82 | 82 |
| Median length | 51 | 49 |
| Mean length | 27.204036 | 27.802691 |
| Min length | 12 | 12 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 12133 | 12400 |
| Distinct characters | 60 | 59 |
| Distinct categories | 7 | 7 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Torber, Mr. Ernst William | Sundman, Mr. Johan Julian |
| 2nd row | Kink, Mr. Vincenz | Ridsdale, Miss. Lucy |
| 3rd row | Kilgannon, Mr. Thomas J | Goldenberg, Mr. Samuel L |
| 4th row | Graham, Mr. George Edward | Daly, Mr. Peter Denis |
| 5th row | Devaney, Miss. Margaret Delia | Trout, Mrs. William H (Jessie L) |
| Value | Count | Frequency (%) |
| mr | 261 | 14.3% |
| miss | 90 | 4.9% |
| mrs | 63 | 3.5% |
| william | 35 | 1.9% |
| master | 22 | 1.2% |
| john | 21 | 1.2% |
| henry | 20 | 1.1% |
| charles | 15 | 0.8% |
| george | 13 | 0.7% |
| james | 12 | 0.7% |
| Other values (897) | 1271 |
| Value | Count | Frequency (%) |
| mr | 253 | 13.6% |
| miss | 87 | 4.7% |
| mrs | 77 | 4.1% |
| john | 28 | 1.5% |
| william | 28 | 1.5% |
| master | 22 | 1.2% |
| george | 15 | 0.8% |
| henry | 14 | 0.8% |
| charles | 13 | 0.7% |
| james | 12 | 0.6% |
| Other values (918) | 1310 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1378 | 11.4% |
| r | 1007 | 8.3% |
| e | 898 | 7.4% |
| a | 821 | 6.8% |
| i | 680 | 5.6% |
| n | 659 | 5.4% |
| s | 650 | 5.4% |
| M | 559 | 4.6% |
| l | 514 | 4.2% |
| o | 490 | 4.0% |
| Other values (50) | 4477 |
| Value | Count | Frequency (%) |
| (space) | 1414 | 11.4% |
| r | 1021 | 8.2% |
| a | 880 | 7.1% |
| e | 874 | 7.0% |
| i | 698 | 5.6% |
| n | 658 | 5.3% |
| s | 654 | 5.3% |
| M | 566 | 4.6% |
| l | 538 | 4.3% |
| o | 516 | 4.2% |
| Other values (49) | 4581 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 7812 | |
| Uppercase Letter | 1834 | 15.1% |
| Space Separator | 1378 | 11.4% |
| Other Punctuation | 966 | 8.0% |
| Open Punctuation | 68 | 0.6% |
| Close Punctuation | 68 | 0.6% |
| Dash Punctuation | 7 | 0.1% |
| Value | Count | Frequency (%) |
| Lowercase Letter | 7989 | |
| Uppercase Letter | 1865 | 15.0% |
| Space Separator | 1414 | 11.4% |
| Other Punctuation | 953 | 7.7% |
| Close Punctuation | 86 | 0.7% |
| Open Punctuation | 86 | 0.7% |
| Dash Punctuation | 7 | 0.1% |
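The category names in these tables correspond to Unicode general categories. As a rough illustration of how such counts arise, the sketch below classifies characters with Python's standard `unicodedata` module. This is an assumption about the approach, not the profiling tool's actual implementation, and the `CATEGORY_NAMES` mapping and `category_counts` helper are hypothetical. The two input strings are taken from the Name samples above.

```python
import unicodedata
from collections import Counter

# Long names for the Unicode general-category codes seen in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def category_counts(values) -> Counter:
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw two-letter code for unmapped categories.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample = ["Kink, Mr. Vincenz", "Trout, Mrs. William H (Jessie L)"]
counts = category_counts(sample)

for category, count in counts.most_common():
    print(f"{category}: {count}")
```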
Most frequent character per category
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1378 |
| Value | Count | Frequency (%) |
| (space) | 1414 |
Lowercase Letter
| Value | Count | Frequency (%) |
| r | 1007 | |
| e | 898 | |
| a | 821 | |
| i | 680 | |
| n | 659 | |
| s | 650 | |
| l | 514 | 6.6% |
| o | 490 | 6.3% |
| t | 340 | 4.4% |
| h | 265 | 3.4% |
| Other values (16) | 1488 |
| Value | Count | Frequency (%) |
| r | 1021 | |
| a | 880 | |
| e | 874 | |
| i | 698 | |
| n | 658 | |
| s | 654 | |
| l | 538 | 6.7% |
| o | 516 | 6.5% |
| t | 350 | 4.4% |
| h | 279 | 3.5% |
| Other values (16) | 1521 |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 559 | |
| A | 124 | 6.8% |
| J | 107 | 5.8% |
| H | 103 | 5.6% |
| S | 102 | 5.6% |
| C | 90 | 4.9% |
| W | 80 | 4.4% |
| E | 79 | 4.3% |
| L | 63 | 3.4% |
| B | 61 | 3.3% |
| Other values (15) | 466 |
| Value | Count | Frequency (%) |
| M | 566 | |
| A | 127 | 6.8% |
| J | 112 | 6.0% |
| H | 100 | 5.4% |
| S | 92 | 4.9% |
| C | 92 | 4.9% |
| E | 83 | 4.5% |
| B | 74 | 4.0% |
| L | 72 | 3.9% |
| R | 67 | 3.6% |
| Other values (15) | 480 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 447 | |
| , | 446 | |
| " | 66 | 6.8% |
| ' | 6 | 0.6% |
| / | 1 | 0.1% |
| Value | Count | Frequency (%) |
| . | 447 | |
| , | 446 | |
| " | 56 | 5.9% |
| ' | 4 | 0.4% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 68 |
| Value | Count | Frequency (%) |
| ( | 86 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 68 |
| Value | Count | Frequency (%) |
| ) | 86 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 7 |
| Value | Count | Frequency (%) |
| - | 7 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 9646 | |
| Common | 2487 | 20.5% |
| Value | Count | Frequency (%) |
| Latin | 9854 | |
| Common | 2546 | 20.5% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| (space) | 1378 | |
| . | 447 | 18.0% |
| , | 446 | 17.9% |
| ( | 68 | 2.7% |
| ) | 68 | 2.7% |
| " | 66 | 2.7% |
| - | 7 | 0.3% |
| ' | 6 | 0.2% |
| / | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| (space) | 1414 | |
| . | 447 | 17.6% |
| , | 446 | 17.5% |
| ) | 86 | 3.4% |
| ( | 86 | 3.4% |
| " | 56 | 2.2% |
| - | 7 | 0.3% |
| ' | 4 | 0.2% |
Latin
| Value | Count | Frequency (%) |
| r | 1007 | 10.4% |
| e | 898 | 9.3% |
| a | 821 | 8.5% |
| i | 680 | 7.0% |
| n | 659 | 6.8% |
| s | 650 | 6.7% |
| M | 559 | 5.8% |
| l | 514 | 5.3% |
| o | 490 | 5.1% |
| t | 340 | 3.5% |
| Other values (41) | 3028 |
| Value | Count | Frequency (%) |
| r | 1021 | 10.4% |
| a | 880 | 8.9% |
| e | 874 | 8.9% |
| i | 698 | 7.1% |
| n | 658 | 6.7% |
| s | 654 | 6.6% |
| M | 566 | 5.7% |
| l | 538 | 5.5% |
| o | 516 | 5.2% |
| t | 350 | 3.6% |
| Other values (41) | 3099 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 12133 |
| Value | Count | Frequency (%) |
| ASCII | 12400 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 1378 | 11.4% |
| r | 1007 | 8.3% |
| e | 898 | 7.4% |
| a | 821 | 6.8% |
| i | 680 | 5.6% |
| n | 659 | 5.4% |
| s | 650 | 5.4% |
| M | 559 | 4.6% |
| l | 514 | 4.2% |
| o | 490 | 4.0% |
| Other values (50) | 4477 |
| Value | Count | Frequency (%) |
| (space) | 1414 | 11.4% |
| r | 1021 | 8.2% |
| a | 880 | 7.1% |
| e | 874 | 7.0% |
| i | 698 | 5.6% |
| n | 658 | 5.3% |
| s | 654 | 5.3% |
| M | 566 | 4.6% |
| l | 538 | 4.3% |
| o | 516 | 4.2% |
| Other values (49) | 4581 |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.6950673 | 4.7443946 |
| Min length | 4 | 4 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 2094 | 2116 |
| Distinct characters | 5 | 5 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | male | male |
| 2nd row | male | female |
| 3rd row | male | male |
| 4th row | male | male |
| 5th row | female | female |
Common Values
| Value | Count | Frequency (%) |
| male | 291 | |
| female | 155 |
| Value | Count | Frequency (%) |
| male | 280 | |
| female | 166 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 601 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 155 | 7.4% |
| Value | Count | Frequency (%) |
| e | 612 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 166 | 7.8% |
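For a column with only two values, the character statistics follow directly from the value counts. The sketch below rebuilds the per-character counts, the total number of characters, and the mean length for Sex from the "Common Values" counts. The helper name is illustrative, and the mean length is assumed to be total characters divided by the 446 observations.

```python
from collections import Counter

N_OBSERVATIONS = 446


def char_counts(value_counts: dict) -> Counter:
    """Weight each character of a value by how often that value occurs."""
    totals = Counter()
    for value, count in value_counts.items():
        for ch in value:
            totals[ch] += count
    return totals


counts_a = char_counts({"male": 291, "female": 155})
counts_b = char_counts({"male": 280, "female": 166})

total_a = sum(counts_a.values())
total_b = sum(counts_b.values())
mean_len_a = total_a / N_OBSERVATIONS
mean_len_b = total_b / N_OBSERVATIONS

print(dict(counts_a), total_a, round(mean_len_a, 7))
print(dict(counts_b), total_b, round(mean_len_b, 7))
```

The letter "e" is counted twice for every "female" row, which is why it exceeds the number of observations.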
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 2094 |
| Value | Count | Frequency (%) |
| Lowercase Letter | 2116 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 601 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 155 | 7.4% |
| Value | Count | Frequency (%) |
| e | 612 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 166 | 7.8% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 2094 |
| Value | Count | Frequency (%) |
| Latin | 2116 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 601 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 155 | 7.4% |
| Value | Count | Frequency (%) |
| e | 612 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 166 | 7.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2094 |
| Value | Count | Frequency (%) |
| ASCII | 2116 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 601 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 155 | 7.4% |
| Value | Count | Frequency (%) |
| e | 612 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 166 | 7.8% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 77 | 72 |
| Distinct (%) | 21.6% | 19.8% |
| Missing | 89 | 83 |
| Missing (%) | 20.0% | 18.6% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 30.441653 | 29.488044 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.75 |
| Maximum | 80 | 74 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.75 |
| 5-th percentile | 5.8 | 4.1 |
| Q1 | 21 | 20 |
| median | 29 | 28 |
| Q3 | 39 | 38 |
| 95-th percentile | 56.2 | 55.35 |
| Maximum | 80 | 74 |
| Range | 79.58 | 73.25 |
| Interquartile range (IQR) | 18 | 18 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 14.568429 | 14.450168 |
| Coefficient of variation (CV) | 0.47856892 | 0.4900348 |
| Kurtosis | 0.37075641 | 0.048340959 |
| Mean | 30.441653 | 29.488044 |
| Median Absolute Deviation (MAD) | 9 | 9 |
| Skewness | 0.45680885 | 0.36328728 |
| Sum | 10867.67 | 10704.16 |
| Variance | 212.23912 | 208.80735 |
| Monotonicity | Not monotonic | Not monotonic |
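Age is the first numeric column with missing values, so it is worth noting which denominator the report uses. The figures above are consistent with the mean and "Distinct (%)" being computed over non-missing values only. That is an inference from the numbers, not a documented guarantee. A small check:

```python
N_OBSERVATIONS = 446

# Age figures copied from the report.
age = {
    "A": {"missing": 89, "sum": 10867.67, "distinct": 77},
    "B": {"missing": 83, "sum": 10704.16, "distinct": 72},
}

results = {}
for name, a in age.items():
    non_missing = N_OBSERVATIONS - a["missing"]
    results[name] = {
        "non_missing": non_missing,
        "mean": a["sum"] / non_missing,                    # mean over observed values
        "distinct_pct": 100 * a["distinct"] / non_missing,  # share of observed values
    }

for name, r in results.items():
    print(f"{name}: n={r['non_missing']} mean={r['mean']:.6f} distinct={r['distinct_pct']:.1f}%")
```

Dividing by all 446 rows instead would give a mean of about 24.4 for Dataset A, which does not match the reported 30.441653.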
| Value | Count | Frequency (%) |
| 28 | 16 | 3.6% |
| 22 | 14 | 3.1% |
| 25 | 14 | 3.1% |
| 19 | 14 | 3.1% |
| 24 | 14 | 3.1% |
| 18 | 12 | 2.7% |
| 30 | 12 | 2.7% |
| 31 | 11 | 2.5% |
| 36 | 11 | 2.5% |
| 21 | 11 | 2.5% |
| Other values (67) | 228 | |
| (Missing) | 89 | 20.0% |
| Value | Count | Frequency (%) |
| 24 | 15 | 3.4% |
| 28 | 15 | 3.4% |
| 22 | 15 | 3.4% |
| 19 | 14 | 3.1% |
| 30 | 14 | 3.1% |
| 25 | 14 | 3.1% |
| 36 | 12 | 2.7% |
| 18 | 12 | 2.7% |
| 21 | 12 | 2.7% |
| 26 | 11 | 2.5% |
| Other values (62) | 229 | |
| (Missing) | 83 | 18.6% |
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.2% |
| 0.83 | 1 | 0.2% |
| 0.92 | 1 | 0.2% |
| 1 | 1 | 0.2% |
| 2 | 4 | |
| 3 | 4 | |
| 4 | 5 | |
| 5 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| 7 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0.75 | 2 | 0.4% |
| 0.83 | 2 | 0.4% |
| 1 | 3 | |
| 2 | 6 | |
| 3 | 1 | 0.2% |
| 4 | 5 | |
| 5 | 1 | 0.2% |
| 6 | 3 | |
| 7 | 1 | 0.2% |
| 8 | 2 | 0.4% |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.54035874 | 0.57847534 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 8 |
| Zeros | 299 | 296 |
| Zeros (%) | 67.0% | 66.4% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 2 | 3 |
| Maximum | 8 | 8 |
| Range | 8 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 1.1205672 | 1.2111406 |
| Coefficient of variation (CV) | 2.0737468 | 2.0936771 |
| Kurtosis | 18.403109 | 16.227935 |
| Mean | 0.54035874 | 0.57847534 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.7392648 | 3.6121318 |
| Sum | 241 | 258 |
| Variance | 1.2556709 | 1.4668615 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 299 | |
| 1 | 108 | 24.2% |
| 2 | 17 | 3.8% |
| 4 | 9 | 2.0% |
| 3 | 7 | 1.6% |
| 8 | 4 | 0.9% |
| 5 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 296 | |
| 1 | 111 | 24.9% |
| 2 | 12 | 2.7% |
| 4 | 11 | 2.5% |
| 3 | 8 | 1.8% |
| 8 | 5 | 1.1% |
| 5 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 299 | |
| 1 | 108 | 24.2% |
| 2 | 17 | 3.8% |
| 3 | 7 | 1.6% |
| 4 | 9 | 2.0% |
| 5 | 2 | 0.4% |
| 8 | 4 | 0.9% |
| Value | Count | Frequency (%) |
| 0 | 296 | |
| 1 | 111 | 24.9% |
| 2 | 12 | 2.7% |
| 3 | 8 | 1.8% |
| 4 | 11 | 2.5% |
| 5 | 3 | 0.7% |
| 8 | 5 | 1.1% |
Parch
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.39237668 | 0.41479821 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 6 | 6 |
| Zeros | 340 | 329 |
| Zeros (%) | 76.2% | 73.8% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 0 | 1 |
| 95-th percentile | 2 | 2 |
| Maximum | 6 | 6 |
| Range | 6 | 6 |
| Interquartile range (IQR) | 0 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.85896996 | 0.81868946 |
| Coefficient of variation (CV) | 2.1891463 | 1.9737054 |
| Kurtosis | 11.593563 | 9.3096702 |
| Mean | 0.39237668 | 0.41479821 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.0187772 | 2.5834683 |
| Sum | 175 | 185 |
| Variance | 0.73782939 | 0.67025243 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 340 | |
| 1 | 59 | 13.2% |
| 2 | 38 | 8.5% |
| 5 | 4 | 0.9% |
| 3 | 2 | 0.4% |
| 4 | 2 | 0.4% |
| 6 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 329 | |
| 1 | 63 | 14.1% |
| 2 | 48 | 10.8% |
| 5 | 2 | 0.4% |
| 3 | 2 | 0.4% |
| 4 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 340 | |
| 1 | 59 | 13.2% |
| 2 | 38 | 8.5% |
| 3 | 2 | 0.4% |
| 4 | 2 | 0.4% |
| 5 | 4 | 0.9% |
| 6 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 329 | |
| 1 | 63 | 14.1% |
| 2 | 48 | 10.8% |
| 3 | 2 | 0.4% |
| 4 | 1 | 0.2% |
| 5 | 2 | 0.4% |
| 6 | 1 | 0.2% |
Ticket
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 384 | 376 |
| Distinct (%) | 86.1% | 84.3% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.6726457 | 6.706278 |
| Min length | 3 | 3 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 2976 | 2991 |
| Distinct characters | 35 | 34 |
| Distinct categories | 5 | 5 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 337 | 326 |
| Unique (%) | 75.6% | 73.1% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 364511 | STON/O 2. 3101269 |
| 2nd row | 315151 | W./C. 14258 |
| 3rd row | 36865 | 17453 |
| 4th row | PC 17582 | 113055 |
| 5th row | 330958 | 240929 |
| Value | Count | Frequency (%) |
| pc | 34 | 6.0% |
| c.a | 14 | 2.5% |
| ca | 7 | 1.2% |
| a/5 | 7 | 1.2% |
| w./c | 6 | 1.1% |
| 382652 | 5 | 0.9% |
| 14879 | 5 | 0.9% |
| sc/paris | 5 | 0.9% |
| s.o.c | 5 | 0.9% |
| f.c.c | 4 | 0.7% |
| Other values (399) | 472 |
| Value | Count | Frequency (%) |
| pc | 32 | 5.7% |
| ca | 9 | 1.6% |
| 2 | 8 | 1.4% |
| ston/o | 8 | 1.4% |
| c.a | 8 | 1.4% |
| a/5 | 7 | 1.2% |
| sc/paris | 6 | 1.1% |
| w./c | 5 | 0.9% |
| a/4 | 5 | 0.9% |
| 3101295 | 5 | 0.9% |
| Other values (398) | 472 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 375 | |
| 1 | 330 | |
| 2 | 297 | |
| 7 | 244 | |
| 4 | 235 | |
| 6 | 209 | 7.0% |
| 0 | 209 | 7.0% |
| 5 | 196 | 6.6% |
| 9 | 169 | 5.7% |
| 8 | 144 | 4.8% |
| Other values (25) | 568 |
| Value | Count | Frequency (%) |
| 3 | 361 | |
| 1 | 349 | |
| 2 | 296 | |
| 4 | 254 | |
| 7 | 237 | |
| 6 | 216 | 7.2% |
| 0 | 204 | 6.8% |
| 5 | 184 | 6.2% |
| 9 | 166 | 5.5% |
| 8 | 143 | 4.8% |
| Other values (24) | 581 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 2408 | |
| Uppercase Letter | 293 | 9.8% |
| Other Punctuation | 144 | 4.8% |
| Space Separator | 118 | 4.0% |
| Lowercase Letter | 13 | 0.4% |
| Value | Count | Frequency (%) |
| Decimal Number | 2410 | |
| Uppercase Letter | 310 | 10.4% |
| Other Punctuation | 135 | 4.5% |
| Space Separator | 119 | 4.0% |
| Lowercase Letter | 17 | 0.6% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 375 | |
| 1 | 330 | |
| 2 | 297 | |
| 7 | 244 | |
| 4 | 235 | |
| 6 | 209 | |
| 0 | 209 | |
| 5 | 196 | |
| 9 | 169 | |
| 8 | 144 | 6.0% |
| Value | Count | Frequency (%) |
| 3 | 361 | |
| 1 | 349 | |
| 2 | 296 | |
| 4 | 254 | |
| 7 | 237 | |
| 6 | 216 | |
| 0 | 204 | |
| 5 | 184 | |
| 9 | 166 | |
| 8 | 143 | 5.9% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 118 |
| Value | Count | Frequency (%) |
| (space) | 119 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 103 | |
| / | 41 | 28.5% |
| Value | Count | Frequency (%) |
| . | 86 | |
| / | 49 |
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 86 | |
| P | 44 | |
| A | 43 | |
| O | 32 | 10.9% |
| S | 31 | 10.6% |
| N | 13 | 4.4% |
| T | 11 | 3.8% |
| W | 8 | 2.7% |
| F | 5 | 1.7% |
| I | 5 | 1.7% |
| Other values (6) | 15 | 5.1% |
| Value | Count | Frequency (%) |
| C | 72 | |
| P | 55 | |
| O | 44 | |
| A | 39 | |
| S | 38 | |
| T | 16 | 5.2% |
| N | 16 | 5.2% |
| W | 10 | 3.2% |
| Q | 5 | 1.6% |
| R | 4 | 1.3% |
| Other values (5) | 11 | 3.5% |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 4 | |
| s | 3 | |
| i | 2 | |
| r | 2 | |
| l | 1 | 7.7% |
| e | 1 | 7.7% |
| Value | Count | Frequency (%) |
| a | 5 | |
| s | 4 | |
| i | 3 | |
| r | 3 | |
| l | 1 | 5.9% |
| e | 1 | 5.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 2670 | |
| Latin | 306 | 10.3% |
| Value | Count | Frequency (%) |
| Common | 2664 | |
| Latin | 327 | 10.9% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 375 | |
| 1 | 330 | |
| 2 | 297 | |
| 7 | 244 | |
| 4 | 235 | |
| 6 | 209 | |
| 0 | 209 | |
| 5 | 196 | |
| 9 | 169 | |
| 8 | 144 | 5.4% |
| Other values (3) | 262 |
| Value | Count | Frequency (%) |
| 3 | 361 | |
| 1 | 349 | |
| 2 | 296 | |
| 4 | 254 | |
| 7 | 237 | |
| 6 | 216 | |
| 0 | 204 | |
| 5 | 184 | |
| 9 | 166 | |
| 8 | 143 | 5.4% |
| Other values (3) | 254 |
Latin
| Value | Count | Frequency (%) |
| C | 86 | |
| P | 44 | |
| A | 43 | |
| O | 32 | 10.5% |
| S | 31 | 10.1% |
| N | 13 | 4.2% |
| T | 11 | 3.6% |
| W | 8 | 2.6% |
| F | 5 | 1.6% |
| I | 5 | 1.6% |
| Other values (12) | 28 | 9.2% |
| Value | Count | Frequency (%) |
| C | 72 | |
| P | 55 | |
| O | 44 | |
| A | 39 | |
| S | 38 | |
| T | 16 | 4.9% |
| N | 16 | 4.9% |
| W | 10 | 3.1% |
| a | 5 | 1.5% |
| Q | 5 | 1.5% |
| Other values (11) | 27 | 8.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2976 |
| Value | Count | Frequency (%) |
| ASCII | 2991 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 375 | |
| 1 | 330 | |
| 2 | 297 | |
| 7 | 244 | |
| 4 | 235 | |
| 6 | 209 | 7.0% |
| 0 | 209 | 7.0% |
| 5 | 196 | 6.6% |
| 9 | 169 | 5.7% |
| 8 | 144 | 4.8% |
| Other values (25) | 568 |
| Value | Count | Frequency (%) |
| 3 | 361 | |
| 1 | 349 | |
| 2 | 296 | |
| 4 | 254 | |
| 7 | 237 | |
| 6 | 216 | 7.2% |
| 0 | 204 | 6.8% |
| 5 | 184 | 6.2% |
| 9 | 166 | 5.5% |
| 8 | 143 | 4.8% |
| Other values (24) | 581 |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 182 | 181 |
| Distinct (%) | 40.8% | 40.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 30.867049 | 33.45979 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 263 | 512.3292 |
| Zeros | 7 | 6 |
| Zeros (%) | 1.6% | 1.3% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.2292 | 7.225 |
| Q1 | 8.0344 | 8.05 |
| median | 15.5 | 15.3729 |
| Q3 | 30.3927 | 34.03125 |
| 95-th percentile | 112.18125 | 112.67708 |
| Maximum | 263 | 512.3292 |
| Range | 263 | 512.3292 |
| Interquartile range (IQR) | 22.3583 | 25.98125 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 41.324374 | 50.624524 |
| Coefficient of variation (CV) | 1.338786 | 1.5129959 |
| Kurtosis | 12.259916 | 37.88923 |
| Mean | 30.867049 | 33.45979 |
| Median Absolute Deviation (MAD) | 7.75 | 7.7979 |
| Skewness | 3.2356587 | 5.070449 |
| Sum | 13766.704 | 14923.066 |
| Variance | 1707.7039 | 2562.8424 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 8.05 | 22 | 4.9% |
| 13 | 22 | 4.9% |
| 7.8958 | 21 | 4.7% |
| 26 | 19 | 4.3% |
| 7.75 | 16 | 3.6% |
| 10.5 | 11 | 2.5% |
| 26.55 | 9 | 2.0% |
| 7.925 | 8 | 1.8% |
| 7.775 | 7 | 1.6% |
| 0 | 7 | 1.6% |
| Other values (172) | 304 |
| Value | Count | Frequency (%) |
| 13 | 26 | 5.8% |
| 8.05 | 23 | 5.2% |
| 7.8958 | 16 | 3.6% |
| 7.75 | 15 | 3.4% |
| 26 | 15 | 3.4% |
| 10.5 | 10 | 2.2% |
| 26.55 | 9 | 2.0% |
| 7.2292 | 7 | 1.6% |
| 7.775 | 7 | 1.6% |
| 7.225 | 7 | 1.6% |
| Other values (171) | 311 |
| Value | Count | Frequency (%) |
| 0 | 7 | |
| 4.0125 | 1 | 0.2% |
| 6.45 | 1 | 0.2% |
| 6.75 | 1 | 0.2% |
| 7.05 | 2 | 0.4% |
| 7.0542 | 1 | 0.2% |
| 7.125 | 1 | 0.2% |
| 7.225 | 4 | |
| 7.2292 | 6 | |
| 7.25 | 6 |
| Value | Count | Frequency (%) |
| 0 | 6 | |
| 4.0125 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.4958 | 2 | 0.4% |
| 6.75 | 2 | 0.4% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 2 | 0.4% |
| 7.125 | 3 | |
| 7.225 | 7 | |
| 7.2292 | 7 |
Cabin
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 84 | 92 |
| Distinct (%) | 90.3% | 88.5% |
| Missing | 353 | 342 |
| Missing (%) | 79.1% | 76.7% |
| Memory size | 7.0 KiB | 7.0 KiB |
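The two percentage rows above use different denominators: Missing (%) is relative to all rows, while Distinct (%) only matches the reported figures when divided by the non-missing rows. The sketch below checks this under stated assumptions: 446 rows per dataset (from the dataset statistics), the Unique counts reported further down in this Cabin section, and a hypothetical helper name, `cabin_percentages`.

```python
N_ROWS = 446  # observations per dataset, taken from the dataset statistics

# Cabin counts copied from the report (Unique appears later in this section).
cabin = {
    "Dataset A": {"missing": 353, "distinct": 84, "unique": 75},
    "Dataset B": {"missing": 342, "distinct": 92, "unique": 82},
}


def cabin_percentages(counts, n_rows=N_ROWS):
    """Return (missing %, distinct %, unique %) rounded to one decimal.

    Hypothetical helper: missing is a share of all rows, while distinct
    and unique are shares of the rows that actually have a Cabin value.
    """
    present = n_rows - counts["missing"]
    return (
        round(100 * counts["missing"] / n_rows, 1),
        round(100 * counts["distinct"] / present, 1),
        round(100 * counts["unique"] / present, 1),
    )


for name, counts in cabin.items():
    print(name, cabin_percentages(counts))
```

For Dataset A this gives 93 non-missing rows, so 84 distinct values correspond to 90.3%, matching the table.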
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 15 | 11 |
| Median length | 3 | 3 |
| Mean length | 3.5591398 | 3.4326923 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 331 | 357 |
| Distinct characters | 19 | 18 |
| Distinct categories | 3 | 3 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
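The three categories counted here are Unicode general categories. As a rough illustration of how such counts can be produced (not necessarily how the profiler does it), the sketch below classifies characters with Python's standard `unicodedata` module. The first five values come from the Dataset A sample rows; the last one, a two-cabin entry containing a space, is a hypothetical example, and `category_counts` is an illustrative name.

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes that occur in Cabin values.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)  # e.g. 'Lu' for 'C', 'Nd' for '9'
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Five sample values from the report plus one hypothetical multi-cabin value.
sample = ["C91", "C7", "C99", "E40", "F2", "B57 B59"]
print(category_counts(sample))
```

A value that lists several cabins contributes a space character, which is where the Space Separator category comes from.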
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 75 | 82 |
| Unique (%) | 80.6% | 78.8% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | C91 | C92 |
| 2nd row | C7 | E17 |
| 3rd row | C99 | B79 |
| 4th row | E40 | E12 |
| 5th row | F2 | B50 |
Words
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| d | 2 | 1.9% |
| b63 | 2 | 1.9% |
| c93 | 2 | 1.9% |
| d36 | 2 | 1.9% |
| b96 | 2 | 1.9% |
| c52 | 2 | 1.9% |
| b66 | 2 | 1.9% |
| b98 | 2 | 1.9% |
| b59 | 2 | 1.9% |
| b57 | 2 | 1.9% |
| Other values (84) | 87 | 81.3% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| d | 3 | 2.6% |
| g6 | 3 | 2.6% |
| f | 3 | 2.6% |
| g73 | 2 | 1.7% |
| c126 | 2 | 1.7% |
| b98 | 2 | 1.7% |
| b96 | 2 | 1.7% |
| d33 | 2 | 1.7% |
| b49 | 2 | 1.7% |
| b77 | 2 | 1.7% |
| Other values (92) | 94 | 80.3% |
Most occurring characters
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| C | 33 | 10.0% |
| 3 | 32 | 9.7% |
| 2 | 30 | 9.1% |
| B | 29 | 8.8% |
| 6 | 24 | 7.3% |
| 1 | 22 | 6.6% |
| 5 | 21 | 6.3% |
| 8 | 20 | 6.0% |
| 9 | 18 | 5.4% |
| D | 16 | 4.8% |
| Other values (9) | 86 | 26.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 40 | 11.2% |
| C | 34 | 9.5% |
| 2 | 31 | 8.7% |
| B | 29 | 8.1% |
| 3 | 27 | 7.6% |
| 6 | 22 | 6.2% |
| D | 21 | 5.9% |
| 8 | 19 | 5.3% |
| 0 | 19 | 5.3% |
| 5 | 19 | 5.3% |
| Other values (8) | 96 | 26.9% |
Most occurring categories
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 210 | 63.4% |
| Uppercase Letter | 107 | 32.3% |
| Space Separator | 14 | 4.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 227 | 63.6% |
| Uppercase Letter | 117 | 32.8% |
| Space Separator | 13 | 3.6% |
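Each Frequency (%) value in these tables is the row's count divided by the table total, so any percentage can be recomputed from the counts alone. A minimal sketch, using the category counts reported above and a hypothetical helper name, `shares`:

```python
def shares(counts):
    """Convert raw counts into percentage shares, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Character counts per category for Cabin, copied from the report.
dataset_a = {"Decimal Number": 210, "Uppercase Letter": 107, "Space Separator": 14}
dataset_b = {"Decimal Number": 227, "Uppercase Letter": 117, "Space Separator": 13}

print(shares(dataset_a))
print(shares(dataset_b))
```

The totals are 331 and 357 characters, which agree with the Total characters row, and the Space Separator shares come out at 4.2% and 3.6% as reported.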
Most frequent character per category
Uppercase Letter
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| C | 33 | 30.8% |
| B | 29 | 27.1% |
| D | 16 | 15.0% |
| E | 13 | 12.1% |
| A | 10 | 9.3% |
| F | 4 | 3.7% |
| G | 1 | 0.9% |
| T | 1 | 0.9% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| C | 34 | 29.1% |
| B | 29 | 24.8% |
| D | 21 | 17.9% |
| E | 16 | 13.7% |
| F | 6 | 5.1% |
| G | 6 | 5.1% |
| A | 5 | 4.3% |
Decimal Number
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 32 | 15.2% |
| 2 | 30 | 14.3% |
| 6 | 24 | 11.4% |
| 1 | 22 | 10.5% |
| 5 | 21 | 10.0% |
| 8 | 20 | 9.5% |
| 9 | 18 | 8.6% |
| 4 | 16 | 7.6% |
| 0 | 14 | 6.7% |
| 7 | 13 | 6.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 40 | 17.6% |
| 2 | 31 | 13.7% |
| 3 | 27 | 11.9% |
| 6 | 22 | 9.7% |
| 8 | 19 | 8.4% |
| 0 | 19 | 8.4% |
| 5 | 19 | 8.4% |
| 9 | 17 | 7.5% |
| 7 | 17 | 7.5% |
| 4 | 16 | 7.0% |
Space Separator
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 14 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 13 | 100.0% |
Most occurring scripts
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 224 | 67.7% |
| Latin | 107 | 32.3% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 240 | 67.2% |
| Latin | 117 | 32.8% |
Most frequent character per script
Latin
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| C | 33 | 30.8% |
| B | 29 | 27.1% |
| D | 16 | 15.0% |
| E | 13 | 12.1% |
| A | 10 | 9.3% |
| F | 4 | 3.7% |
| G | 1 | 0.9% |
| T | 1 | 0.9% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| C | 34 | 29.1% |
| B | 29 | 24.8% |
| D | 21 | 17.9% |
| E | 16 | 13.7% |
| F | 6 | 5.1% |
| G | 6 | 5.1% |
| A | 5 | 4.3% |
Common
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 32 | |
| 2 | 30 | |
| 6 | 24 | |
| 1 | 22 | |
| 5 | 21 | |
| 8 | 20 | |
| 9 | 18 | |
| 4 | 16 | |
| 0 | 14 | |
| (space) | 14 | |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 40 | |
| 2 | 31 | |
| 3 | 27 | |
| 6 | 22 | |
| 8 | 19 | |
| 0 | 19 | |
| 5 | 19 | |
| 9 | 17 | |
| 7 | 17 | |
| 4 | 16 | 6.7% |
Most occurring blocks
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 331 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 357 | 100.0% |
Most frequent character per block
ASCII
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| C | 33 | 10.0% |
| 3 | 32 | 9.7% |
| 2 | 30 | 9.1% |
| B | 29 | 8.8% |
| 6 | 24 | 7.3% |
| 1 | 22 | 6.6% |
| 5 | 21 | 6.3% |
| 8 | 20 | 6.0% |
| 9 | 18 | 5.4% |
| D | 16 | 4.8% |
| Other values (9) | 86 | 26.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 40 | 11.2% |
| C | 34 | 9.5% |
| 2 | 31 | 8.7% |
| B | 29 | 8.1% |
| 3 | 27 | 7.6% |
| 6 | 22 | 6.2% |
| D | 21 | 5.9% |
| 8 | 19 | 5.3% |
| 0 | 19 | 5.3% |
| 5 | 19 | 5.3% |
| Other values (8) | 96 | 26.9% |
Embarked
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 1 | 1 |
| Missing (%) | 0.2% | 0.2% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 445 | 445 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | S | S |
| 2nd row | S | S |
| 3rd row | Q | C |
| 4th row | S | S |
| 5th row | Q | S |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| S | 322 | 72.2% |
| C | 78 | 17.5% |
| Q | 45 | 10.1% |
| (Missing) | 1 | 0.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| S | 325 | 72.9% |
| C | 90 | 20.2% |
| Q | 30 | 6.7% |
| (Missing) | 1 | 0.2% |
Words
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| s | 322 | 72.4% |
| c | 78 | 17.5% |
| q | 45 | 10.1% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| s | 325 | 73.0% |
| c | 90 | 20.2% |
| q | 30 | 6.7% |
Most occurring characters
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| S | 322 | 72.4% |
| C | 78 | 17.5% |
| Q | 45 | 10.1% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| S | 325 | 73.0% |
| C | 90 | 20.2% |
| Q | 30 | 6.7% |
Most occurring categories
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| Uppercase Letter | 445 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| Uppercase Letter | 445 | 100.0% |
Most frequent character per category
Uppercase Letter
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| S | 322 | 72.4% |
| C | 78 | 17.5% |
| Q | 45 | 10.1% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| S | 325 | 73.0% |
| C | 90 | 20.2% |
| Q | 30 | 6.7% |
Most occurring scripts
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 445 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 445 | 100.0% |
Most frequent character per script
Latin
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| S | 322 | 72.4% |
| C | 78 | 17.5% |
| Q | 45 | 10.1% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| S | 325 | 73.0% |
| C | 90 | 20.2% |
| Q | 30 | 6.7% |
Most occurring blocks
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 445 | 100.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 445 | 100.0% |
Most frequent character per block
ASCII
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| S | 322 | 72.4% |
| C | 78 | 17.5% |
| Q | 45 | 10.1% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| S | 325 | 73.0% |
| C | 90 | 20.2% |
| Q | 30 | 6.7% |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 603 | 604 | 0 | 3 | Torber, Mr. Ernst William | male | 44.0 | 0 | 0 | 364511 | 8.0500 | NaN | S |
| 69 | 70 | 0 | 3 | Kink, Mr. Vincenz | male | 26.0 | 2 | 0 | 315151 | 8.6625 | NaN | S |
| 778 | 779 | 0 | 3 | Kilgannon, Mr. Thomas J | male | NaN | 0 | 0 | 36865 | 7.7375 | NaN | Q |
| 332 | 333 | 0 | 1 | Graham, Mr. George Edward | male | 38.0 | 0 | 1 | PC 17582 | 153.4625 | C91 | S |
| 44 | 45 | 1 | 3 | Devaney, Miss. Margaret Delia | female | 19.0 | 0 | 0 | 330958 | 7.8792 | NaN | Q |
| 153 | 154 | 0 | 3 | van Billiard, Mr. Austin Blyler | male | 40.5 | 0 | 2 | A/5. 851 | 14.5000 | NaN | S |
| 561 | 562 | 0 | 3 | Sivic, Mr. Husein | male | 40.0 | 0 | 0 | 349251 | 7.8958 | NaN | S |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 832 | 833 | 0 | 3 | Saad, Mr. Amin | male | NaN | 0 | 0 | 2671 | 7.2292 | NaN | C |
| 318 | 319 | 1 | 1 | Wick, Miss. Mary Natalie | female | 31.0 | 0 | 2 | 36928 | 164.8667 | C7 | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 414 | 415 | 1 | 3 | Sundman, Mr. Johan Julian | male | 44.00 | 0 | 0 | STON/O 2. 3101269 | 7.9250 | NaN | S |
| 526 | 527 | 1 | 2 | Ridsdale, Miss. Lucy | female | 50.00 | 0 | 0 | W./C. 14258 | 10.5000 | NaN | S |
| 453 | 454 | 1 | 1 | Goldenberg, Mr. Samuel L | male | 49.00 | 1 | 0 | 17453 | 89.1042 | C92 | C |
| 857 | 858 | 1 | 1 | Daly, Mr. Peter Denis | male | 51.00 | 0 | 0 | 113055 | 26.5500 | E17 | S |
| 399 | 400 | 1 | 2 | Trout, Mrs. William H (Jessie L) | female | 28.00 | 0 | 0 | 240929 | 12.6500 | NaN | S |
| 320 | 321 | 0 | 3 | Dennis, Mr. Samuel | male | 22.00 | 0 | 0 | A/5 21172 | 7.2500 | NaN | S |
| 504 | 505 | 1 | 1 | Maioni, Miss. Roberta | female | 16.00 | 0 | 0 | 110152 | 86.5000 | B79 | S |
| 831 | 832 | 1 | 2 | Richards, Master. George Sibley | male | 0.83 | 1 | 1 | 29106 | 18.7500 | NaN | S |
| 111 | 112 | 0 | 3 | Zabour, Miss. Hileni | female | 14.50 | 1 | 0 | 2665 | 14.4542 | NaN | C |
| 353 | 354 | 0 | 3 | Arnold-Franchi, Mr. Josef | male | 25.00 | 1 | 0 | 349237 | 17.8000 | NaN | S |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 74 | 75 | 1 | 3 | Bing, Mr. Lee | male | 32.0 | 0 | 0 | 1601 | 56.4958 | NaN | S |
| 594 | 595 | 0 | 2 | Chapman, Mr. John Henry | male | 37.0 | 1 | 0 | SC/AH 29037 | 26.0000 | NaN | S |
| 374 | 375 | 0 | 3 | Palsson, Miss. Stina Viola | female | 3.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
| 147 | 148 | 0 | 3 | Ford, Miss. Robina Maggie "Ruby" | female | 9.0 | 2 | 2 | W./C. 6608 | 34.3750 | NaN | S |
| 722 | 723 | 0 | 2 | Gillespie, Mr. William Henry | male | 34.0 | 0 | 0 | 12233 | 13.0000 | NaN | S |
| 676 | 677 | 0 | 3 | Sawyer, Mr. Frederick Charles | male | 24.5 | 0 | 0 | 342826 | 8.0500 | NaN | S |
| 407 | 408 | 1 | 2 | Richards, Master. William Rowe | male | 3.0 | 1 | 1 | 29106 | 18.7500 | NaN | S |
| 152 | 153 | 0 | 3 | Meo, Mr. Alfonzo | male | 55.5 | 0 | 0 | A.5. 11206 | 8.0500 | NaN | S |
| 690 | 691 | 1 | 1 | Dick, Mr. Albert Adrian | male | 31.0 | 1 | 0 | 17474 | 57.0000 | B20 | S |
| 488 | 489 | 0 | 3 | Somerton, Mr. Francis William | male | 30.0 | 0 | 0 | A.5. 18509 | 8.0500 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 439 | 440 | 0 | 2 | Kvillner, Mr. Johan Henrik Johannesson | male | 31.0 | 0 | 0 | C.A. 18723 | 10.5000 | NaN | S |
| 695 | 696 | 0 | 2 | Chapman, Mr. Charles Henry | male | 52.0 | 0 | 0 | 248731 | 13.5000 | NaN | S |
| 372 | 373 | 0 | 3 | Beavan, Mr. William Thomas | male | 19.0 | 0 | 0 | 323951 | 8.0500 | NaN | S |
| 858 | 859 | 1 | 3 | Baclini, Mrs. Solomon (Latifa Qurban) | female | 24.0 | 0 | 3 | 2666 | 19.2583 | NaN | C |
| 290 | 291 | 1 | 1 | Barber, Miss. Ellen "Nellie" | female | 26.0 | 0 | 0 | 19877 | 78.8500 | NaN | S |
| 334 | 335 | 1 | 1 | Frauenthal, Mrs. Henry William (Clara Heinsheimer) | female | NaN | 1 | 0 | PC 17611 | 133.6500 | NaN | S |
| 141 | 142 | 1 | 3 | Nysten, Miss. Anna Sofia | female | 22.0 | 0 | 0 | 347081 | 7.7500 | NaN | S |
| 269 | 270 | 1 | 1 | Bissette, Miss. Amelia | female | 35.0 | 0 | 0 | PC 17760 | 135.6333 | C99 | S |
| 355 | 356 | 0 | 3 | Vanden Steen, Mr. Leo Peter | male | 28.0 | 0 | 0 | 345783 | 9.5000 | NaN | S |
| 288 | 289 | 1 | 2 | Hosono, Mr. Masabumi | male | 42.0 | 0 | 0 | 237798 | 13.0000 | NaN | S |
Dataset A
Dataset does not contain duplicate rows.
Dataset B
Dataset does not contain duplicate rows.